EGT modelling of cooperation in the classroom with LLMs
Modelling cooperation amongst peers
The model examines the relationship between peers in a learning environment who provide feedback on each other's work. In this model the agents are students, which we will refer to as learners, and interactions consist of learners choosing to provide and receive feedback from each other. We use \(P\) to indicate the learner providing feedback, and \(R\) the learner receiving feedback.
The strategy for interacting with other learners is initially described by a single parameter:
\(\Theta \in[0,1]\): Level of cooperative effort - that is, how much effort a learner puts into providing feedback to another learner. The more effort towards cooperation, the more benefit a peer receives from the feedback, and the more cost is incurred by the provider. \(\Theta_P\) indicates the cooperative effort of the learner providing the feedback, and \(\Theta_R\) that of the learner receiving the feedback.
The environment is balanced by four parameters that guide the payoffs of the different strategies:
\(c\) - Cost of cooperation
\(f\) - Benefit to yourself from others providing feedback / cooperating with you.
\(b\) - Benefit to yourself for providing others with feedback (i.e. you learn by providing feedback)
\(a\) - Benefit to yourself if you are cooperating with someone with the same strategy as you (i.e. a social benefit, gravitation towards similar learners).
The model initially explores the scenario \(c=4,f=5, b=2,a=1\), and will generally assume that \(f > c\). We explore varying strategies and varying the value of \(b\).
In this framework, if a learner with effort strategy \(\Theta_R\) receives feedback from someone with effort strategy \(\Theta_P\), they gain some perceived value, \(V(R|P)\) (read as the value to receiver \(R\) when they meet provider \(P\)), out of sharing feedback with a peer:
\[V(R|P) = \Theta_P f + \Theta_R b - \Theta_R c + a[RP]\]
Where \(a[RP]=a\) if \(\Theta_R=\Theta_P\), and zero otherwise. This amounts to a preference for working with like-minded peers. Alternatively, \(a[RP]\) can be specified as \(a(1-|\Theta_R-\Theta_P|)\), which is mathematically smoother, though it takes a little more thought to interpret. This is the form used from here on:
\[
V(R|P) = \Theta_P f + \Theta_R b - \Theta_R c + a(1-|\Theta_R-\Theta_P|)
\]
So the learner gains (or at least perceives to gain) value from the knowledge in the feedback, at a level dependent on the provider's effort (\(f\Theta_P\)), as well as from providing feedback to others, dependent on their own effort towards cooperation (\(b\Theta_R\)). They also incur a cost relative to the effort they put into cooperation (\(c\Theta_R\)).
Within this framework there are three strategies that agents can take that we will examine:
\(C\): Cooperation, where cooperative effort is highest, \(\Theta_C=1\)
\(T\): Token-effort, where cooperative effort is half-way, \(\Theta_T=0.5\)
\(F\): Free-rider, with no cooperative effort, \(\Theta_F=0\)
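As a quick sanity check, here is a minimal sketch of the base payoff for these strategies, using \(c=4, f=5, b=2, a=1\). Note that base_payoff is a hypothetical helper for illustration only, separate from the get_payoff() function defined later.

# Base (non-genAI) payoff - a sketch for checking the three strategies
base_payoff <- function(Theta_R, Theta_P, c = 4, f = 5, b = 2, a = 1) {
  Theta_P * f + Theta_R * b - Theta_R * c + a * (1 - abs(Theta_R - Theta_P))
}

base_payoff(Theta_R = 1, Theta_P = 1)  # C meets C: 5 + 2 - 4 + 1 = 4
base_payoff(Theta_R = 0, Theta_P = 1)  # F meets C: 5 + 0 - 0 + 0 = 5
base_payoff(Theta_R = 1, Theta_P = 0)  # C meets F: 0 + 2 - 4 + 0 = -2

The free-rider temptation is already visible here: against a cooperator, \(F\) earns 5 while \(C\) earns only 4 against another cooperator.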
To introduce genAI we add two new strategy parameters, indicating the use of genAI for providing feedback to others, or the use of genAI to supplement the feedback a learner receives:
\(\Pi\in\{0,1\}\): The use of genAI to provide feedback (1) or not (0). \(\Pi_P\) indicates the choice of the provider of feedback.
\(\Gamma \in \{0,1\}\): The use of genAI to analyse your own work (1) or not (0). \(\Gamma_R\) indicates the choice of the receiver.
Additionally, there are two new environment parameters:
\(q\): The quality of the genAI feedback.
\(k\): The reduced cost of providing feedback using AI.
We use two values: \(q=0.8\) to indicate feedback slightly worse than what a peer would provide (with full effort), or \(q=1.25\) for feedback slightly better than you would expect from peers. We use \(k=0.1\) to indicate a low cost (compared to providing the feedback yourself).
This adds four new strategies of genAI use (note that the paper only introduces one):
\(N\): No genAI use, \(\Pi=0,\Gamma=0\).
\(S\): Using genAI for only your own work, \(\Pi=0,\Gamma=1\). This incurs an additional cost \(c\times k\), but also provides a new benefit \(f \times q\).
\(O\): Using genAI only to provide feedback for others, \(\Pi=1, \Gamma = 0\). This reduces the cost of providing feedback to \(c \times k\) (as \(0<k<1\)), but also changes the value of the feedback to \(f \times q\) instead of \(f \times \Theta_P\).
\(B\): Using genAI for yourself and others, \(\Pi=1,\Gamma=1\). This combines the effects of \(S\) and \(O\).
We then combine the cooperation strategies and the genAI strategies. So \(CN\) indicates that the learner is fully cooperating but not using genAI for themselves or for providing feedback, while \(FS\) indicates a free-rider (no effort towards cooperation) who uses genAI for their own feedback. Note that the strategies \(FO\) and \(FB\) do not make sense: a free-rider provides no feedback to others, so would not bother using genAI for it. They might move towards \(TS\) or \(TB\), however.
The new, extended value calculation is more complicated. As implemented in the code below, the receiver's payoff becomes:
\[
V(R|P) = f(1-\Pi_P+q\Pi_P)\Theta_P + (b-c)(1-\Pi_R+k\Pi_R)\Theta_R + a(1-|\Theta_R-\Theta_P|) + \Gamma_R(qf-kc)
\]
It takes a bit of looking, but for \(\Gamma_R=\Pi_R=\Pi_P=0\) this formula reduces to the non-genAI scenario (note that \(\Gamma_P\) is irrelevant for calculating the receiver's payoff).
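As a quick check of that reduction, here is the extended payoff written inline. ext_payoff is a hypothetical stand-in for illustration, not the get_payoff() implementation shown below.

# Extended payoff - inline sketch with q = 0.8, k = 0.1 and parameters as above
ext_payoff <- function(Theta_R, Theta_P, Gamma_R = 0, Pi_R = 0, Pi_P = 0,
                       c = 4, f = 5, b = 2, a = 1, q = 0.8, k = 0.1) {
  f * (1 - Pi_P + q * Pi_P) * Theta_P +
    (b - c) * (1 - Pi_R + k * Pi_R) * Theta_R +
    a * (1 - abs(Theta_R - Theta_P)) +
    Gamma_R * (q * f - k * c)
}

ext_payoff(1, 1)               # CN meets CN: 4, matching the base formula
ext_payoff(1, 1, Gamma_R = 1)  # CS meets CS: 4 + (0.8*5 - 0.1*4) = 7.6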
In the paper, the parameter \(\Gamma\) is omitted: it is assumed that everyone uses the same strategy for genAI feedback on their own work, so this term balances out.
Furthermore, although this exploration chooses specific values of \(\Theta\), \(\Pi\) and \(\Gamma\), the model allows for continuous versions of these strategies.
Instantiating the game
The game is instantiated in the code below:
library(tidyverse)
# library(EvolutionaryGames)

get_payoff <- function(
    # Strat_r: Receiving strat, Strat_p: Providing strat
    # expects rows of a data frame with columns Eff, AIP, AIS
    Strat_r, Strat_p,
    p = list(  # Parameters
      c = 4,   # cost
      f = 5,   # receive benefit - c <> b_r might depend on knowledge gap??
      b = 2,   # self benefit
      q = 0.8, # LLM is 0.8 of decent feedback
      k = 0.1, # reduces cost of giving fb by this
      a = 1
    )) {
  # Relabelling to match model formula
  Theta_R <- Strat_r$Eff
  Theta_P <- Strat_p$Eff
  Gamma_R <- Strat_r$AIS
  Pi_R <- Strat_r$AIP
  Pi_P <- Strat_p$AIP
  f <- p$f
  b <- p$b
  c <- p$c
  a <- p$a
  q <- p$q
  k <- p$k

  # Feedback received: provider effort, with AI quality q replacing effort when Pi_P = 1
  V.f <- (1 - Pi_P + q * Pi_P) * Theta_P
  # Net self benefit and cost of providing feedback; AI scales both by k when Pi_R = 1
  V.bc <- ((1 - Pi_R) + Pi_R * k) * Theta_R
  # Assortment: preference for peers with similar effort levels
  V.a <- 1 - abs(Theta_R - Theta_P)
  # The Gamma_R term adds the benefit q*f and cost k*c of genAI self-analysis
  V <- V.f * f + V.bc * (b - c) + V.a * a + Gamma_R * (q * f - k * c)
  return(V)
}
Comparing Free-rider, Token (half effort) and Cooperator.
Initially with \(c=4,f=5,b=2, a=1\).
# Model params
p <- list(
  c = 4,   # cost
  f = 5,   # receive benefit
  b = 2,   # self benefit
  q = 0.8, # quality of AI feedback
  k = 0.1, # cost of using AI
  a = 1    # assortment
)
glimpse(p)
List of 6
$ c: num 4
$ f: num 5
$ b: num 2
$ q: num 0.8
$ k: num 0.1
$ a: num 1
For a proportion \(p\) of cooperators, the overall fitness of the population is given by \(W(p)=2p^2+2p(1-p)=2p\), so at the population level we want \(p\) as high as possible. From the point of view of the individual, however, it is much more attractive to choose \(FN\).
We can also include TN to see how the dynamics evolve over time.
From this we can see that a population evenly mixed between the three strategies (in the centre of the triangle) would move away from the cooperative strategy \(CN\) and slowly veer towards a population of free-riders, \(FN\).
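A hedged sketch of this drift, using discrete replicator-style updates on the base payoff matrix; this is an illustrative reconstruction, not the code behind the triangle plot.

# Replicator-style dynamics for CN, TN, FN under c = 4, f = 5, b = 2, a = 1
Theta <- c(C = 1, T = 0.5, F = 0)
A <- outer(Theta, Theta,
           function(tr, tp) tp * 5 + tr * 2 - tr * 4 + (1 - abs(tr - tp)))
x <- c(C = 1, T = 1, F = 1) / 3  # start at the centre of the simplex
for (step in 1:50) {
  w <- as.vector(A %*% x)        # expected payoff of each (row) strategy
  x <- x * (w - min(w) + 0.1)    # shift payoffs so weights stay positive
  x <- x / sum(x)
}
round(x, 3)                      # mass drains from C and accumulates on F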
Introducing AI
Three plots each for lower AI quality (\(q=0.8\)) and higher AI quality (\(q=1.25\)). There are many strategy combinations, so we make some simplifying assumptions about behaviour:
For any cooperation strategy \(X\), \(E[V(XS)]>E[V(XO)]>E[V(XN)]\), so the genAI strategies of none, \(XN\), and others-only, \(XO\), are ignored. This follows from the high benefit-to-cost ratio of using genAI on your own feedback. (A supplementary proof may be worthwhile.)
\(FB=TB=CB\), so these will all be treated as \(FB\). We use the label \(FB\) because no real effort is going into the feedback: the genAI is providing it.
Change in thinking: given the fixed benefit of genAI for your own work, it is easier to communicate the results by looking only at strategies with no genAI for self (effectively dropping the \(\Gamma\) parameter).
The idea here is to generate a data set that might match, at least partially, what is observable in a classroom setting.
Potential tangible data
The change in cooperation: not the exact mix of strategies, but the overall direction of total / average cooperative effort.
Potentially individual cooperative effort, with high error (i.e. number of peer feedbacks provided - probably no way to tell between T and C strategies though)
An estimate of GenAI use.
A given number of interactions: these changes are observed over \(N\) peer feedback interactions.
Academic improvement, under the assumption that more effort in the peer feedback process yields more benefit.
Intangible data to generate
Effectively we can generate a simulated prior distribution of data to draw from in a Bayesian framework.
Fixed for data generation:
- Population size (number of students), \(N_S=100\)
- Number of peer feedback interactions, \(N_F=5\)
- Chance of switching to the partner's strategy, if it resulted in a higher payoff, \(\psi=0.5\)
- Chance of changing strategy (mutation), \(\mu = 0.01\), to any available strategy
- Not used here, but it would be interesting to have a parameter for the chance that a learner chooses the same partner for the next feedback round
Prior distributions:
- Prior distribution of environment parameters (could be informed by experts). Note that fixing one is fine (e.g. fix \(c = 4\)) as this just sets the scale.
- Prior distribution of strategies (probably uninformed)
- Prior distribution of AI / no AI
- Initial distribution of strategies, \(C\), \(T\), \(F\), for -N and -AI/O
Simulation:
- Run the simulation over \(N_F\) interactions, tracking changes in the distribution of strategies
Computed metrics:
- Overall fitness of the population at each step (average payoff)
- Average cooperative behaviour (\(\bar{\Theta}\))
- Average AI use (\(\bar{\Pi}\))
N_S <- 100      # number of students
N_F <- 5        # number of feedback iterations (games played)
switch_p <- 0.5 # chance of changing strategy, if feedback partner benefited more

strategies <- tibble(
  Label = c("CN", "TN", "FN", "CO", "TO"),
  W = c(2, 1, 1, 2, 1)
)

# testing
# strategies <-
#   tibble(Label = c("CN", "FN"),
#          W = c(1, 1))

play_game <- function(list_of_learners) {
  # Takes a list of learners, randomly pairs them
  # computes payoffs to each learner
  N <- length(list_of_learners)
  pairs <- sample(1:N)
  Ri <- pairs
  Pi <- c(pairs[(N/2 + 1):N], pairs[1:(N/2)])
  tibble(
    id = pairs,
    R = list_of_learners[Ri],
    P = list_of_learners[Pi]
  ) |>
    left_join(payoff_df_long, by = c("R", "P")) |>
    arrange(id)
}

run_sim <- function(
    p,
    strategies,             # strategies should include Label and initial W.
    switch_chance = 0.5,    # will consider changing strategy
    mutation_chance = 0.02, # extra weight added to ALL strategies but weighted by the
                            # average fitness of the last round - allows for mutation
    N = 100,
    runs = 5) {
  # running the sim
  strat_weights <- all_strategies |> inner_join(strategies, by = "Label")
  payoff_df_long <<- fetch_payoff_matrix(strategies$Label, p, as_df = T) |>
    rename(R = `Payoff to strategy:`) |>
    pivot_longer(-R, names_to = "P", values_to = "V")
  payoff_df_long <<- inner_join(
    payoff_df_long,
    payoff_df_long |>
      mutate(oldR = R, oldP = P) |>
      select(-R, -P) |>
      select(R = oldP, P = oldR, oppV = V),
    by = c("R", "P")
  ) |>
    mutate(opponent_better_off = oppV > V)

  learners <- sample(strat_weights$Label, size = N, replace = T, prob = strat_weights$W)
  k <- 0
  sim.results <- tibble(
    id = 1:length(learners),
    strategy = factor(learners, levels = strat_weights$Label)
  ) |>
    mutate(step = 0, V = 0, opponent_better_off = FALSE)

  while (k <= runs) {
    game_result <- play_game(learners)
    k <- k + 1
    # add data
    sim.results <- bind_rows(
      sim.results,
      game_result |>
        mutate(step = k) |>
        select(id, step, strategy = R, V, opponent_better_off)
    )
    # New weights - based on whole population fitness
    new_strat_weights <- game_result |>
      group_by(R) |>
      summarise(V = mean(V)) |>
      rename(strategy = R)
    # new strategies
    learners <- game_result |>
      mutate(
        probs = runif(nrow(game_result)),
        mutation_strat = sample(
          new_strat_weights$strategy, nrow(game_result),
          replace = T,
          prob = 1 + new_strat_weights$V - min(new_strat_weights$V)
        )
      ) |>
      mutate(
        mutate = probs < mutation_chance,
        switch = probs > (1 - switch_chance) & opponent_better_off
      ) |>
      mutate(strategy = case_when(
        mutate ~ mutation_strat,
        switch ~ P,
        TRUE ~ R
      )) |>
      pull(strategy)
  }
  sim.results <- bind_cols(
    tibble(
      f = p$f[[1]], a = p$a, b = p$b, c = p$c, q = p$q, k = p$k,
      mu = mutation_chance, psi = switch_chance
    ),
    sim.results
  )
  return(sim.results)
}

summarise_sim <- function(sim.results) {
  sim.results |>
    inner_join(all_strategies |> rename(strategy = Label), by = "strategy") |>
    group_by(f, b, c, a, q, k, mu, psi, step) |>
    summarise(
      W = mean(V),
      Theta_bar = mean(Eff),
      Pi_bar = mean(AIP),
      pC = mean(str_detect(strategy, "C")),
      pT = mean(str_detect(strategy, "T")),
      pF = mean(str_detect(strategy, "F")),
      .groups = "drop"
      # Theta_sd = sd(Eff),
      # Pi_sd = sd(AIP),
    )
}

delta_of_sim_summary <- function(d) {
  d.p <- d |> select(f, b, c, a, q, k, mu, psi) |> slice(1)
  diff.d <- d |> select(W, pC, pT, pF, Theta_bar, Pi_bar)
  bind_cols(
    d.p,
    slice(diff.d, nrow(diff.d)) - slice(diff.d, 2)
  )
}
Running a single sim, and computing the changes in:
Overall fitness \(W\)
Proportions of \(C\), \(T\) and \(F\) strategies
Average cooperation effort \(\bar{\Theta}\)
Average use of GenAI \(\bar{\Pi}\)
run_sim(
  p = list(
    c = 4,   # cost
    f = 5,   # receive benefit
    b = 4,   # self benefit
    q = 0.8, # LLM is 0.8 of decent feedback
    k = 0.1, # reduces cost of giving fb by this, but also reduces self benefit
    a = 1    # assortment / social incentive to align strategies
  ),
  strategies = tibble(
    Label = c("CO", "TO", "FN", "CN", "TN"),
    W = c(1, 1, 2, 1, 1)
  ),
  switch_chance = 0.2,
  mutation_chance = 0.6
) |>
  summarise_sim() |>
  delta_of_sim_summary()
# A tibble: 1 × 14
f b c a q k mu psi W pC pT pF
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 5 4 4 1 0.8 0.1 0.6 0.2 0.66 0.1 0.08 -0.18
# ℹ 2 more variables: Theta_bar <dbl>, Pi_bar <dbl>
This can then be repeated for a range of parameter values, and the results combined to form a joint distribution of these parameters.
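The assembly step for the combined results (the d.results used below) is not shown here, so the following is a hedged sketch of how such a set of runs might be generated; the prior ranges and the number of runs are assumptions for illustration.

# A sketch: draw parameters from rough priors, run the sim, keep the deltas
set.seed(1)
d.results <- purrr::map_dfr(1:20, function(i) {
  p.draw <- list(
    c = 4,                       # fixed - this just sets the scale
    f = runif(1, 3, 7),          # assumed prior ranges
    b = runif(1, 0, 4),
    q = sample(c(0.8, 1.25), 1), # lower or higher AI quality
    k = 0.1,
    a = runif(1, 0, 2)
  )
  run_sim(
    p = p.draw,
    strategies = tibble(
      Label = c("CN", "TN", "FN", "CO", "TO"),
      W = c(1, 1, 1, 1, 1)
    ),
    switch_chance = runif(1, 0.2, 0.8),
    mutation_chance = 0.01
  ) |>
    summarise_sim() |>
    delta_of_sim_summary()
})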
We can now (with this small number of runs) view part of the joint distribution, say between the overall cooperation fitness (\(W\)), the level of cooperative effort (\(\bar{\Theta}\)), and GenAI use (\(\bar{\Pi}\)):
d.results |>
  ggplot(aes(x = pC + pT, y = W, color = Pi_bar)) +
  geom_vline(xintercept = 0) +
  geom_hline(yintercept = 0) +
  geom_point(size = 3, alpha = 0.8) +
  viridis::scale_color_viridis(name = "Change in GenAI use") +
  theme_minimal() +
  xlab("Change in Cooperator proportion") +
  ylab("Change in average feedback value") +
  theme(legend.position = "bottom")
Or look at the correlations within the simulated data:
Other thoughts:
- Computed aggregations of those strategies - mean and sd of key parameters. These are not observable, but might be more easily related to computed changes, such as a move towards more cooperation.
- Distribution of environment parameters: these might be informed by experts, perhaps as part of prior elicitation.
- Aggregation of environment parameters: easier to understand? So \(b-c\) instead of just \(b\).
- Computation of N runs of the simulation.
- Evolution of strategies.